CLUSTER ANALYSIS AND ITS APPLICATIONS IN MARKETING RESEARCH

Lawrence  Sherman  and  Jagdish  N.  Sheth* 
University  of  Illinois 

Introduction 

Classification is the identification of an observation and its
placement into a homogeneous group based on observed characteristics.
When the groups can be specified a priori, multiple discriminant
analysis (MDA) provides an analytical method to derive classification
functions. See Anderson (1958) for an excellent discussion of
classification procedures based on the linear discriminant function.
However, in marketing research a priori specification of groups is often
impossible for lack of formal theory, and the researcher must choose
for his analysis among the heuristic, probabilistic, or combinatorial
algorithms that have been proposed to deal with such situations. While
it is virtually impossible to describe all clustering procedures in this
paper, Frank and Green (1968), Anderberg (1973), Bijnen (1973), and
Cormack (1971) provide a good starting place for a basic introduction
to clustering multivariate observations.

An assumption underlying the use of cluster analysis is that
homogeneous subgroups or clusters actually exist in the data. The basic
problem in cluster analysis is to devise an algorithm that reduces the
sorting of an entity into g groups based on a profile of p attributes.
When g is unknown, the number of possibilities of sorting n observations
is

\[ \sum_{g=1}^{n} f_n^{(g)} \tag{1} \]

where $f_n^{(g)}$ is a Stirling number of the second kind and is defined by
Abramowitz and Stegun (1964) as

\[ f_n^{(g)} = \frac{1}{g!} \sum_{k=0}^{g} (-1)^{g-k} \binom{g}{k} k^n \tag{2} \]

Complete enumeration of $f_n^{(g)}$ is impractical as a method of sorting each
observation into groups. In fact, it would probably be difficult to
differentiate the correct cluster from such a large number of clusters.
Therefore, some heuristic or optimal rule must be designed which will
make the task manageable and meaningful.
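The growth of the number of possible sortings can be made concrete with a short computation. The following is a minimal sketch that evaluates the Stirling number of the second kind from its alternating-sum definition and adds the values over all g; the function names `stirling2` and `total_partitions` are illustrative and not taken from the paper.

```python
from math import comb, factorial

def stirling2(n: int, g: int) -> int:
    """Ways to sort n distinct observations into g non-empty groups."""
    total = sum((-1) ** (g - k) * comb(g, k) * k ** n for k in range(g + 1))
    return total // factorial(g)  # the alternating sum is always divisible by g!

def total_partitions(n: int) -> int:
    """Sum of stirling2(n, g) over g = 1..n, i.e. the count when g is unknown."""
    return sum(stirling2(n, g) for g in range(1, n + 1))

if __name__ == "__main__":
    for n in (5, 10, 25):
        print(n, total_partitions(n))
```

Even ten observations admit more than one hundred thousand distinct sortings, which is why complete enumeration is not a practical clustering method.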

This paper discusses some of the problems and decisions in the
application of clustering methods, reviews some recent marketing
applications, and concludes by stressing the problems implicit in cluster
analysis.

Decisions in Cluster Analysis

Human judgment is the single most important factor in the
generation of meaningful clustering results. Major decisions facing the
analyst can be stated as:

(1) How do we select a similarity measure which will index
a profile vector in order to make comparisons among
entities?

(2) How do we compute the clusters?

(3) How do we determine the number of clusters in the data?

(4) How do we design the research strategy?

(5) Can the clusters be quantitatively and meaningfully
justified?

It  should  be  made  clear  that  each  decision  must  be  made  on  the  basis 
of  sound  criteria  nurtured  by  the  research  problem. 


The  Similarity  Matrix 

To convert a profile vector of an observation into a similarity index,
it is critical to know the type of measurement utilized in the research
problem. The classical classification of scales is provided by Stevens
(1951) and Torgerson (1958) and summarized below.

The Four Basic Scales

                 No Natural Origin    Natural Origin

No Distance      Nominal              Ordinal

Distance         Interval             Ratio

Nominal scaled data refers to a numbering of the observations where
measurement does not connote properties of the observation. Ordinal
data indicates a serial ordering of the entities such that the numbers
are determined within a monotonic increasing or decreasing transformation.
When numbers are assigned to reflect different amounts of a given property
between objects, the data is said to be interval scaled. Ratio scaled
data has the property of interval scaled data, plus a natural origin
defined by the measurement. The measures can be further classified as
continuous, discrete, or dichotomous, the last reflecting presence or
absence of the phenomenon.


Making comparisons between entities depends on the similarity measure
that is defined. Similarity measures are of two types: distance measures
and association (proximity) measures. Selection of the similarity measure
depends on the scale utilized in the data. Use of the distance measure
requires the specification of a metric of measurement. The metric has
the formal properties of

\[
\begin{aligned}
D(X,Y) &= 0, \quad \text{if } X = Y \\
D(X,Y) &> 0, \quad \text{if } X \neq Y \\
D(X,Y) &= D(Y,X) \\
D(X,Y) &\le D(X,Z) + D(Y,Z)
\end{aligned}
\]

for X, Y, and Z in a metric space. The fourth property is the familiar
triangular inequality.

In order to develop a similarity index between two entities based
on the distance measure, the most general form is the Minkowski metric
with constant $\lambda$, defined as:

\[ d_{ij} = \left[ \sum_{k=1}^{p} W_k \, |X_{ik} - X_{jk}|^{\lambda} \right]^{1/\lambda} \tag{3} \]

where we define

i, j are the subscripts for entity i and j

$d_{ij}$ represents the distance measure

$X_{ik}$ is the projection of entity $X_i$ on orthogonal axis k

$W_k$ is 1 for unweighted distances

p is the number of axes of the space

$\lambda$ is the metric of the space
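As an illustration of the Minkowski family of distances, the sketch below computes the weighted distance between two profile vectors for an arbitrary constant. The function name `minkowski_distance` and its argument names are hypothetical; absolute differences are used so that odd values of the constant also give a valid distance.

```python
from typing import Optional, Sequence

def minkowski_distance(x_i: Sequence[float], x_j: Sequence[float],
                       lam: float = 2.0,
                       weights: Optional[Sequence[float]] = None) -> float:
    """Weighted Minkowski distance between two entities measured on p axes."""
    if weights is None:
        weights = [1.0] * len(x_i)  # unweighted case: every W_k equals 1
    total = sum(w * abs(a - b) ** lam for w, a, b in zip(weights, x_i, x_j))
    return total ** (1.0 / lam)

if __name__ == "__main__":
    print(minkowski_distance([0, 0], [3, 4], lam=2))  # Euclidean distance
    print(minkowski_distance([0, 0], [3, 4], lam=1))  # city-block distance
```

The same pair of points is 5 units apart under the Euclidean metric and 7 units apart under the city-block metric, which shows how the choice of constant changes the structure imposed on the data.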

When the Minkowski constant $\lambda = 2$, the similarity metric is the
familiar Euclidean distance. Conceptually each entity can be viewed as a
point in p-dimensional Euclidean space. The closer the distance the more
similar the entities; the farther the distance the more dissimilar the
entities. This is the most often used distance measure in cluster
analysis. However, Attneave (1950) has proposed the City-Block metric
($\lambda = 1$) to deal with certain perceptual situations, and it has
been used by Johnson and Wall (1969) in a clustering solution. Although
not developed in this paper, and often forgotten in marketing applications,
the selection of $\lambda$ in the metric measurement model imposes a
structure on the data. When the scales of measurement differ and contain
no intrinsic information, $W_k$ can be used to scale variable k
(i.e., $W_k = 1/S_k^2$). With highly correlated variables the variable
configuration and the orthogonal axes of the metric space do not correspond.
Green and Rao (1969) discuss a method of dealing with redundancy in the
data by computing distances in principal component space. This is
equivalent to the Mahalanobis generalized distance (Morrison 1967):

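\[ D^2 = (X_i - \bar{X})' \, W^{-1} (X_i - \bar{X}) \tag{4} \]

where

$X_i$ is the ith observation vector

$\bar{X}$ is $\frac{1}{n} \sum_{i=1}^{n} X_i$, and

$W$ is $\sum_{j=1}^{g} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)(X_{ij} - \bar{X}_j)'$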
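The equivalence between distances in principal component space and the Mahalanobis distance can be sketched as follows. This is a short derivation under the assumption that W is symmetric and positive definite; the symbols U, $\Theta$, $\theta_k$, and $z_i$ are introduced here for illustration and do not appear in the paper.

```latex
\begin{aligned}
W &= U \Theta U', \qquad U'U = I, \qquad
     \Theta = \operatorname{diag}(\theta_1, \dots, \theta_p) \\
z_i &= U'(X_i - \bar{X}) \\
D^2 &= (X_i - \bar{X})' \, U \Theta^{-1} U' \, (X_i - \bar{X})
     = z_i' \Theta^{-1} z_i
     = \sum_{k=1}^{p} \frac{z_{ik}^2}{\theta_k}
\end{aligned}
```

In words, the generalized distance is the squared Euclidean distance computed on component scores after each component has been rescaled by its root, so correlated variables no longer receive implicit extra weight.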

When individual differences in perceptions are expected in cluster analysis,
a "modified" Euclidean distance measure can be used:

\[ d_{jki} = \left[ \sum_{m=1}^{p} (w_{im} a_{jm} - w_{im} a_{km})^2 \right]^{1/2} \tag{5} \]

where distance $d_{jki}$ is measured in a p-dimensional space between
attributes j and k for entity i. The weight $w_{im}$ is given to axis m by
entity i and $a_{jm}$ is the projection of attribute j on axis m. This
measure is discussed by Horan (1969), and Carroll and Chang (1970) for the
study of individual differences in multidimensional scaling. Bloxom (1974)
suggests that it is a special case of the ACOVS procedure.

Proximity measures or measures of association depend mainly on the
level of measurement of the data. When data are interval scaled, the
cross-products matrix V, the variance-covariance matrix C, and the
correlation matrix R have been mainly used in marketing research. Given a
data matrix X,

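\[ V = X'X \tag{6} \]

\[ C = \frac{1}{n-1} \, (V - n \bar{X}' \bar{X}) \tag{7} \]

\[ R = S^{-1} C S^{-1}, \quad \text{where } S = (\operatorname{Diag} C)^{1/2} \tag{8} \]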

Using C implies that the level of measurement is unimportant, since it is
subtracted out. When R is used, scale of measurement and level of
measurement are both assumed unimportant, since R is scaled by the
standard deviations of each variable. Fleiss and Zubin (1969) argue
against the use of standardized data because they contend that scaling
should be done on the clusters and not on the data matrix X. Implicit in
the use of V, C, and R is a linear structure; Lehman (1974) has applied a
non-linear correlation measure to marketing data in examining methods of
grouping. Numerous other association measures have been proposed for
specific purposes.
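The three association matrices can be computed directly from a small data matrix. The sketch below is illustrative only: the function names are hypothetical, the data matrix is a list of rows, and the divisor n - 1 in the covariance is an assumption (the usual sample estimate), since the divisor is not legible in the source.

```python
from math import sqrt

def cross_products(X):
    """V = X'X for a data matrix X given as a list of observation rows."""
    p = len(X[0])
    return [[sum(row[a] * row[b] for row in X) for b in range(p)]
            for a in range(p)]

def covariance(X):
    """C: cross-products with the means subtracted out (divisor n - 1 assumed)."""
    n, p = len(X), len(X[0])
    V = cross_products(X)
    xbar = [sum(row[a] for row in X) / n for a in range(p)]
    return [[(V[a][b] - n * xbar[a] * xbar[b]) / (n - 1) for b in range(p)]
            for a in range(p)]

def correlation(X):
    """R: covariance rescaled by the standard deviation of each variable."""
    C = covariance(X)
    s = [sqrt(C[a][a]) for a in range(len(C))]
    return [[C[a][b] / (s[a] * s[b]) for b in range(len(C))]
            for a in range(len(C))]

if __name__ == "__main__":
    X = [[1, 2], [2, 4], [3, 6]]
    print(cross_products(X), covariance(X), correlation(X))
```

In this toy matrix the second variable is exactly twice the first, so V and C differ in size between the two variables while R reports a correlation of one; this is the sense in which R discards both level and scale.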

When data are in binary form, the matching coefficient is a useful
method of computing measures of association. Given the contingency table

                1          0          Total

  1             a          b          a + b

  0             c          d          c + d

  Total       a + c      b + d      a + b + c + d

and allowing 1 to indicate presence and 0 to indicate absence, the
Rogers-Tanimoto coefficient

\[ \frac{a + d}{a + d + 2(b + c)} \tag{9} \]


has been the most referenced, but by no means the only, matching type
coefficient. Cormack (1971) provides a discussion of various similarity
measures based on coefficients of association. When data consist of
mixed scales, the choice of a meaningful similarity measure becomes
troublesome. Perhaps the best advice is that the cluster analyst use
foresight in the collection of his data to avoid mixed scale
transformations. To facilitate the selection of a similarity measure,
Table 1 lists selected formulae and appropriate references.
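For binary profiles, the matching counts and the coefficient can be computed in a few lines. This is a minimal sketch using the standard form of the Rogers-Tanimoto coefficient, (a + d) / (a + d + 2(b + c)); the function names are illustrative, and the profiles are assumed to be non-empty sequences of 0 and 1 of equal length.

```python
def match_counts(x, y):
    """Cells of the 2 x 2 table: a = (1,1), b = (1,0), c = (0,1), d = (0,0)."""
    a = sum(1 for u, v in zip(x, y) if u == 1 and v == 1)
    b = sum(1 for u, v in zip(x, y) if u == 1 and v == 0)
    c = sum(1 for u, v in zip(x, y) if u == 0 and v == 1)
    d = sum(1 for u, v in zip(x, y) if u == 0 and v == 0)
    return a, b, c, d

def rogers_tanimoto(x, y):
    """Matches divided by matches plus twice the mismatches."""
    a, b, c, d = match_counts(x, y)
    return (a + d) / (a + d + 2 * (b + c))

if __name__ == "__main__":
    print(rogers_tanimoto([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

Mismatches are counted twice in the denominator, so the coefficient falls off faster than a simple proportion of matching attributes would.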


Please  Insert  Table  1  about  here 


With  nonmetric  data,  Green  (17)  suggests  that  multidimensional  scaling 
be  carried  out  to  bring  out  the  metric  qualities  of  the  data  and  cluster 
analysis  be  performed  on  the  configuration  by  computing  distances  in  the 
derived  space.  While  only  experimental  results  are  reported,  the  method 
appears  promising  for  marketing  data. 


The  Clustering  Algorithm 

The second major decision facing the cluster analyst is the choice of
the clustering algorithm. Selection of the clustering algorithm must be
made on the basis of anticipated properties of the clusters. Only
recently has mathematical analysis been applied to provide a theoretical
basis for clustering. Two basic methods of generating clusters of entities
exist: hierarchical cluster analysis and non-hierarchical cluster analysis.
In this section, references and a classification table of selected
clustering methods are given. For the interested reader's benefit, it
should be noted that most papers describe the author's prescribed method.
Though this list is long, it is incomplete, but it forms a basic core of
readings from which additional references can be easily obtained.


Please  insert  Table  2  about  here 


The term hierarchical refers to a method of cluster analysis that
moves between a strong clustering (i.e., each entity is a separate
cluster) and a weak clustering (i.e., all entities in a single cluster)
on the basis of a similarity matrix S, subject to an objective criterion
specified by the clustering method. If the method starts with a strong
clustering and tries to achieve a weak clustering, it is known as
agglomerative; if it starts with a weak clustering and tries to achieve a
strong clustering, it is known as divisive.

[Figure: An Illustrative Hierarchical Clustering Tree, running from the
strong cluster to the weak cluster along an index of similarity.]


Hierarchical methods can be used to cluster observations or variables.
Jolliffe (1973) reports several methods that may be used to discard
redundant variables in principal components analysis. Since the similarity
matrix contains n(n-1)/2 distinct elements, hierarchical methods have
usually been applied to samples with fewer than 400 observations. Johnson
(1967) has programmed two methods of cluster analysis, quite similar to the
methods of single linkage and complete linkage discussed in Sokal and
Sneath (1963), which are invariant under monotonic transformations of the
similarity measure. The similarity measure ($S_{ij}$) is derived from
utilization of the ultrametric inequality

\[ d(x,z) \le \max[d(x,y),\, d(y,z)] \tag{10} \]

When the distance between a newly merged cluster [x,y] and another entity
z is defined as the minimum,

\[ d([x,y],z) = \min[d(x,z),\, d(y,z)] \tag{11} \]

the procedure is referred to as the minimum (single linkage) method. When
it is defined as the maximum,

\[ d([x,y],z) = \max[d(x,z),\, d(y,z)] \tag{12} \]

it is the maximum (complete linkage) method. Marketing and
psychological applications have used these two methods with ordinal data
when the use of a distance measure (usually Euclidean) has been untenable.
Hubert (1974) has generalized the complete linkage and single linkage
methods through graph theory. His approach offers the capability of
overlapping clusters and asymmetric similarity measures. It would appear
that the subjective decisions in clustering will diminish as the graph
theoretical approach gives clustering methods the badly needed
mathematical foundations for the derivation of clusters. In our opinion,
graph theory (Hubert 1973, 1974) and tree structures (Hartigan 1967) offer
many advantages in the establishment of a mathematical foundation for
clustering.
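The minimum and maximum methods differ only in how the distance from a merged cluster to the remaining clusters is updated. The sketch below is a small agglomerative routine written for illustration, not a reproduction of Johnson's program; the function name `agglomerate` and the return format (a list of merged member sets with their merge levels) are assumptions.

```python
from itertools import combinations

def agglomerate(dist, linkage="single"):
    """Merge the closest pair repeatedly; return [(members, level), ...]."""
    combine = min if linkage == "single" else max  # minimum vs maximum method
    n = len(dist)
    clusters = [frozenset([i]) for i in range(n)]
    # Distances are keyed by the unordered pair of clusters they refer to.
    d = {frozenset([clusters[i], clusters[j]]): dist[i][j]
         for i, j in combinations(range(n), 2)}
    merges = []
    while len(clusters) > 1:
        pair = min(d, key=d.get)          # closest pair of current clusters
        level = d[pair]
        a, b = tuple(pair)
        others = [c for c in clusters if c != a and c != b]
        new_d = {k: v for k, v in d.items() if a not in k and b not in k}
        merged = a | b
        for c in others:                  # update rule for the merged cluster
            new_d[frozenset([merged, c])] = combine(d[frozenset([a, c])],
                                                    d[frozenset([b, c])])
        d = new_d
        clusters = others + [merged]
        merges.append((merged, level))
    return merges

if __name__ == "__main__":
    points = [0, 1, 3, 7]
    dist = [[abs(p - q) for q in points] for p in points]
    print(agglomerate(dist, "single"))
    print(agglomerate(dist, "complete"))
```

On the four points used in the usage example both methods merge the same entities in the same order, but the complete linkage levels are higher because the farthest members determine the distance between clusters. Ties between equally close pairs are broken arbitrarily in this sketch.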




Nonhierarchical clustering methods have been developed to cluster n
entities into g groups when g is unknown. MacQueen (1967), Friedman and
Rubin (1967), and Ball and Hall (1965) have provided the early work in this
area. Differences between the algorithms are generally in the generation
of the initial configuration, in the criterion which is maximized or
minimized to obtain the "best" partitions, and in the method of determining
the number of clusters that exist in the data. Recently McRae (1973) has
developed a procedure (MIKCA) which encompasses many of the concepts of
non-hierarchical methods and will be discussed in some detail in this
paper. Given a data matrix X and assuming g is known, the total
cross-products matrix can be decomposed as

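\[ T = X'X = W + B \tag{13} \]

where:

$W$ is $\sum_{k=1}^{g} \sum_{i=1}^{n_k} (X_{ik} - \bar{X}_k)(X_{ik} - \bar{X}_k)'$

$B$ is $\sum_{k=1}^{g} n_k \bar{X}_k \bar{X}_k'$

$n_k$ is the number of observations in the kth cluster

g is the number of clusters

$X_{ik}$ is the ith observation vector in the kth cluster.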

The procedure generates g points in the space and, on the basis of an
objective criterion chosen from among the following, develops clusters.

1. Minimize |W|. Wilks' lambda is $\Lambda = |W|/|T|$. Since |T| is
fixed, minimizing |W| results in a small $\Lambda$, which indicates large
differences between groups.

2. Minimize trace W. Using this criterion minimizes

\[ \operatorname{Trace} W = \sum_{k=1}^{g} \sum_{i=1}^{n_k} (X_{ik} - \bar{X}_k)'(X_{ik} - \bar{X}_k) \]

resulting in "minimum" variance partitions.

3. Maximize the trace of $W^{-1}B$. This is Hotelling's trace
criterion, which is $\max \sum_{i=1}^{p} \lambda_i$ and is derived from the
determinantal equation $|B - \lambda W| = 0$.

4. Maximize the largest root of $W^{-1}B$. This was proposed
by S. N. Roy, since when $\lambda$ is large, large differences
exist.
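The decomposition of the total cross-products matrix and the trace criterion can be checked numerically for any candidate partition. The following sketch is illustrative only: the helper names are hypothetical, a partition is passed as a list of clusters (each a list of observation vectors), and only the trace W criterion is evaluated, since the determinant and root criteria would need additional linear algebra.

```python
def outer(u, v):
    """Outer product u v' as a nested list."""
    return [[a * b for b in v] for a in u]

def add(A, B):
    """Element-wise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scatter_decomposition(groups):
    """Return (T, W, B) for a partition given as a list of clusters."""
    p = len(groups[0][0])
    T = [[0.0] * p for _ in range(p)]
    W = [[0.0] * p for _ in range(p)]
    B = [[0.0] * p for _ in range(p)]
    for obs in groups:
        n_k = len(obs)
        mean = [sum(x[a] for x in obs) / n_k for a in range(p)]
        # Between part: n_k times the outer product of the cluster mean.
        B = add(B, [[n_k * v for v in row] for row in outer(mean, mean)])
        for x in obs:
            T = add(T, outer(x, x))          # total cross-products
            dev = [x[a] - mean[a] for a in range(p)]
            W = add(W, outer(dev, dev))      # within-cluster deviations
    return T, W, B

def trace(A):
    return sum(A[a][a] for a in range(len(A)))

if __name__ == "__main__":
    good = [[[0, 0], [2, 0]], [[10, 10], [10, 12]]]
    bad = [[[0, 0], [10, 10]], [[2, 0], [10, 12]]]
    for partition in (good, bad):
        T, W, B = scatter_decomposition(partition)
        print(trace(W))
```

The total matrix is the same for both partitions of the four points, so a smaller trace W for the first partition means that more of the total has moved into the between-cluster part.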

The cluster analyst specifies the number of groups g desired, and g
points are randomly dispersed within the space. On the basis of the
criterion specified, an initial grouping of the entities is obtained, mean
vectors are calculated, and the procedure continues in an iterative manner
until the selected criterion converges. Once the "final" form clusters
are obtained, they may be described by a linear discriminant function
using the derived clusters as the a priori specified groups. However,
if the within-group variance-covariance matrices are not equal across
groups, the linear discriminant function is not optimal in describing
group separation. With the widespread availability of multiple
discriminant analysis (MDA) procedures, a parametric clustering method
independently proposed by Urbankh (1972), Mayer (1971), and Cassetti
(1964) should see increasing use in non-hierarchical clustering
applications in marketing. The procedure to implement this algorithm is

(1) Randomly divide the sample into g groups.

(2) Run MDA using g groups.

(3) Classify the entities into groups on the basis of the linear
discriminant function.

(4) Reclassify the entities on the basis of the Lachenbruch
classification method to provide almost unbiased
discriminant functions (Lachenbruch and Mickey, 1968).

(5) Switch misclassified entities into the nearest discriminant
group (the smallest distance from the group centroids,
utilizing Mahalanobis $D^2$ statistics) to form new
predetermined groups.

(6) Repeat Steps 2 to 5 until no entities are misclassified.

(7) Repeat Steps 2 to 6 for g + k (k = 1, 2, ..., n - g) groups.

(8) Select the number of groups on the basis of Rao's F
statistic for overall significance. If W is singular,
the generalized pseudo-inverse can be computed (Theil,
1971).
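The reassignment loop in Steps 2 to 6 can be sketched in simplified form. The code below is not the full procedure: it replaces the MDA and Lachenbruch steps with direct assignment of each entity to the centroid with the smallest Mahalanobis $D^2$ under the pooled within-group matrix, it takes the initial division from the caller instead of drawing it at random, and it raises an error for a singular W instead of computing a pseudo-inverse. All function names are hypothetical.

```python
def invert(A):
    """Gauss-Jordan inverse of a small square matrix with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("W is singular; a pseudo-inverse would be needed")
        M[col], M[pivot] = M[pivot], M[col]
        scale = M[col][col]
        M[col] = [v / scale for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [v - factor * w for v, w in zip(M[r], M[col])]
    return [row[n:] for row in M]

def centroids_and_pooled_w(X, labels, g):
    """Group mean vectors and the pooled within-group cross-products W."""
    p = len(X[0])
    means, W = {}, [[0.0] * p for _ in range(p)]
    for k in range(g):
        members = [x for x, lab in zip(X, labels) if lab == k]
        if not members:
            continue  # an emptied group simply drops out
        m = [sum(x[a] for x in members) / len(members) for a in range(p)]
        means[k] = m
        for x in members:
            dev = [x[a] - m[a] for a in range(p)]
            for a in range(p):
                for b in range(p):
                    W[a][b] += dev[a] * dev[b]
    return means, W

def d2(x, m, W_inv):
    """Mahalanobis-type squared distance from entity x to centroid m."""
    dev = [a - b for a, b in zip(x, m)]
    return sum(dev[a] * W_inv[a][b] * dev[b]
               for a in range(len(dev)) for b in range(len(dev)))

def reassign_until_stable(X, labels, g, max_iter=100):
    """Move entities to the nearest centroid until no entity switches."""
    labels = list(labels)
    for _ in range(max_iter):
        means, W = centroids_and_pooled_w(X, labels, g)
        W_inv = invert(W)
        new = [min(means, key=lambda k: d2(x, means[k], W_inv)) for x in X]
        if new == labels:
            break
        labels = new
    return labels

if __name__ == "__main__":
    X = [[0, 0], [1, 0], [0, 1], [1, 1],
         [10, 10], [11, 10], [10, 11], [11, 11]]
    start = [0, 0, 0, 1, 1, 1, 1, 1]  # entity (1, 1) starts in the wrong group
    print(reassign_until_stable(X, start, g=2))
```

Like any iterative reassignment scheme, the result depends on the initial division, so several starting divisions would normally be tried before the number of groups is chosen.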

For the data analyst without a computer, there is no need to despair.
McQuitty (1967, 1970, 1971) provides a clustering technique based on
hand computations. A discussion of the "quick" method of clustering
and step by step directions for applications in marketing are given
by Kamen (1970), and an extension using principal components analysis
is given by Aaker (1971).

The  Research  Strategy 

How can cluster analysis be used to aid in the interpretation of
relationships latent in the data structure? This question hinges on
the strategy employed. Due to the many implicit and explicit criteria
that must be specified or assumed in clustering, the technique cannot
be blindly followed without a great deal of peril. Methods of cluster
analysis enable the marketing researcher to work closely with his data.
Roscoe, Sheth, and Howell (1974) have pointed out the need for
inter-technique cross-validation in the search for invariant structure in
marketing data. Since the selection of the similarity measure and the
clustering algorithm imposes a given structure on the data, it is
recommended that several clustering results be compared. Finally, Sokal
and Rohlf (1962) utilized the cophenetic correlation coefficient, which
is the product-moment correlation coefficient between the similarity
measure derived from a hierarchical structure and the original similarity
matrix, as a measure of fit. Studies to date (Sammon, 1966; Sneath, 1965;
and McQuitty, 1971) indicate that evaluating clustering procedures is
not a minor problem, since some methods produce dissimilar results and
other methods produce comparable results.
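The cophenetic correlation can be computed once the merge levels of a hierarchy are known. The sketch below is illustrative: it works with dissimilarities instead of similarities, the function names are hypothetical, and the hierarchy is passed as a list of (members, level) pairs in merge order.

```python
from math import sqrt

def pearson(x, y):
    """Product-moment correlation between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def cophenetic_values(n, merges):
    """Level at which each pair (i, j), i < j, first falls in one cluster."""
    coph = {}
    for members, level in merges:
        for i in members:
            for j in members:
                if i < j and (i, j) not in coph:
                    coph[(i, j)] = level
    return [coph[(i, j)] for i in range(n) for j in range(i + 1, n)]

if __name__ == "__main__":
    # Four points at 0, 1, 3, 7 clustered by the minimum (single linkage) method.
    original = [1, 3, 7, 2, 6, 4]   # pairs (0,1),(0,2),(0,3),(1,2),(1,3),(2,3)
    merges = [({0, 1}, 1), ({0, 1, 2}, 2), ({0, 1, 2, 3}, 4)]
    print(pearson(original, cophenetic_values(4, merges)))
```

A coefficient near one indicates that the tree reproduces the original dissimilarities closely; comparing the coefficient across algorithms is one way to carry out the cross-validation recommended above.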

A proper research strategy must encompass foresight in the
collection of data, familiarity with clustering decisions, and a firm grasp
of the research problem. Clustering can be used with factor analysis
to produce clearer factorial structures, with discriminant analysis
when a priori groupings are unknown, and with multiple regression when
data structures are heterogeneous and hypothesis testing is the
objective. See Elton and Gruber (1970) for further discussion on this.

Applications  of  Cluster  Analysis  in  Marketing 

Widespread usage of cluster analysis in marketing research has
not occurred despite its suitability to many marketing problems. This
section of the paper reviews selected applications in marketing to
illustrate the adaptability of the method to marketing problems. Particular
problems described in these studies should form the basis for identifying
marketing applications.

The subjectivity of the methods will be stressed and the potential
problems, both methodological and theoretical, in the application of
cluster analysis will be enumerated. It should, however, be noted that
the subjective decisions in cluster analysis can often form the basis for
imaginative application of the technique to marketing problems so long as
one is aware of the decisions that must be made. To provide an intuitive
perspective for the applied researcher, each review will discuss the
purpose and nature of the research problem, the mechanics of the clustering
procedures, and the problem areas and decisions experienced by the
researchers in their particular applications.

Test  Market  Selection 

Orderly classification of multidimensional marketing phenomena is a
problem that remains unresolved in many marketing areas. Green, Frank, and
Robinson (1967) approached the problem of test market selection through
numerical methods of cluster analysis. They are among the pioneers in
the application of cluster analysis to marketing. The research problem
and purpose of the paper was to develop a method of matching
representative test markets with larger product markets. A large number of
market characteristics were considered simultaneously by cluster
analysis in an n-dimensional metric space. This paper is not only an
important application, but is a clear explication of a research strategy
by researchers aware of the decision and subjectivity problems inherent
in cluster analysis.

The technique employed in the paper was to measure distances among
test cities by the familiar Euclidean distance formula,

\[ d_{jk} = \left[ \sum_{i=1}^{n} (X_{ij} - X_{ik})^2 \right]^{1/2} \tag{14} \]

to identify clusters in which cities within a cluster are more similar
than cities in different clusters. Similarity was defined by the distance
in the metric space. As is often the case in the social sciences, market
characteristics were measured in different scales and were therefore
normalized to have a mean of 0 and a standard deviation of 1. The number
of cities to appear in each cluster was specified in advance by the prior
desire of the researchers to have five cities in each cluster, subject to
a maximum cutoff distance that precludes clustering of distant cities
(points) in the Euclidean space.
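The normalization and distance steps described here can be sketched as follows. This is an illustration under stated assumptions, not the authors' program: the function names are hypothetical, the standard deviation uses the n - 1 divisor, every characteristic is assumed to vary across cities, and `neighbours_within` is only one simple way to combine a fixed cluster size with a cutoff distance.

```python
from math import sqrt

def standardize(X):
    """Rescale each column (market characteristic) to mean 0 and s.d. 1."""
    n, p = len(X), len(X[0])
    means = [sum(row[a] for row in X) / n for a in range(p)]
    sds = [sqrt(sum((row[a] - means[a]) ** 2 for row in X) / (n - 1))
           for a in range(p)]
    return [[(row[a] - means[a]) / sds[a] for a in range(p)] for row in X]

def euclidean(u, v):
    """Euclidean distance between two standardized city profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def neighbours_within(Z, j, cutoff, size):
    """Up to `size` nearest cities to city j that lie within the cutoff."""
    ranked = sorted((euclidean(Z[j], Z[k]), k) for k in range(len(Z)) if k != j)
    return [k for dist, k in ranked[:size] if dist <= cutoff]

if __name__ == "__main__":
    cities = [[1, 100], [2, 200], [3, 300]]   # two characteristics per city
    Z = standardize(cities)
    print(Z)
    print(neighbours_within(Z, 0, cutoff=2.0, size=5))
```

Without the normalization step the second characteristic, which is measured in units one hundred times larger, would dominate every distance.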

The problem of "weighting" market characteristics with highly
correlated measures was addressed by the authors by first doing a principal
components analysis on the data matrix and then clustering on the principal
components scores for each city. Cluster analysis was performed on the
component scores and the consequent distance measures (Mahalanobis
generalized distance). "Implicit" weighting of correlated measures
should be consciously considered and the analyst must decide which
method of analysis is more appropriate. One method of weighting not
dealt with in this paper, however, is the "explicit" weighting scheme.
Market characteristics more relevant to the research problem can be
weighted by prior judgment rather than assigning an equal weight
of unity to each market characteristic or relying on some statistical
criterion.

The heuristics of clustering methods may produce suboptimal clusters
from a mathematical view, but when compared with the simultaneous
assessment of multidimensional data by a market researcher the relevant
question to ask seems to be: Does the method aid the assessment of
multidimensional data? In this paper, the answer was a clear yes according
to the authors. However, see Morrison (1967) for a discussion of
alternative procedures for calculating distances.

International  Marketing 

Establishment of world marketing segments based on cultural,
socio-economic, and political characteristics is of importance with the
growth of international business. Sethi (1971) cluster analyzed 91
countries on the basis of 29 interval and ratio scaled variables. The
objective of his paper was to establish homogeneous geographical market
segments.

The method of cluster analysis employed consisted of the V-analysis
(variable x variable) and the O-analysis (object x object), which are
parts of the BC-TRY system discussed earlier in the book. Variable
groups which have within-group similarity and between-group differences
are formed through V-analysis. The first step in V-analysis is to select
k sets of n variable clusters that can reproduce the original matrix of
intercorrelations among the variables. Each variable cluster dimension
is defined by the collinear subset of variables defined by an index of
proportionality, $P^2$, as

\[ P^2 = \frac{\left( \sum_{k=1}^{p} r_{x_i x_k} \, r_{x_j x_k} \right)^2}{\sum_{k=1}^{p} r_{x_j x_k}^2 \; \sum_{k=1}^{p} r_{x_i x_k}^2} \tag{15} \]

where p is the index for the number of variables. Unlike the principal
components analysis, this method factors common variance and not the
total variance in the data and produces clusters of variables rather
than linear combinations of variables. Since the cluster dimensions
need not be orthogonal (uncorrelated), distortions in the distance
measure based on them may occur.
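The index of proportionality compares the correlation profiles of two variables: it is the squared sum of cross-products of their correlations with all variables, divided by the product of the two sums of squared correlations. The sketch below is an illustration only; the function name is hypothetical, and it assumes the sums run over every variable, including the two being compared, using whatever values sit on the diagonal of the correlation matrix.

```python
def proportionality_index(R, i, j):
    """Squared cosine between rows i and j of a correlation matrix R."""
    p = len(R)
    cross = sum(R[i][k] * R[j][k] for k in range(p))
    ss_i = sum(R[i][k] ** 2 for k in range(p))
    ss_j = sum(R[j][k] ** 2 for k in range(p))
    return cross ** 2 / (ss_i * ss_j)

if __name__ == "__main__":
    R = [[1.0, 0.5, 0.2],
         [0.5, 1.0, 0.1],
         [0.2, 0.1, 1.0]]
    print(proportionality_index(R, 0, 1))
    print(proportionality_index(R, 0, 2))
```

Values near one identify variables whose correlation profiles are nearly proportional (collinear), which is the basis for placing them in the same variable cluster.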

Object analysis is obtained by assigning a variable cluster score
to each object, usually on the basis of a simple sum composite of the
dimensions of a cluster from the V-analysis. Other methods using
principal components scores are part of the approach. While V-analysis
and O-analysis form one method of clustering variables or objects,
disciples of this approach tend to treat the BC-TRY system as a unified
method of data analysis. It must be emphasized that considerable
subjectivity and unresolved mathematical problems still exist and, like all
clustering methods, a heuristic defined by the program is maximized
without regard to any statistical sampling theory.

In Sethi's paper four variable clusters, termed aggregate production
and trade, personal consumption, international trade, and health and
education, were formed. O-analysis produced eight country pattern types with
differing profile descriptor patterns. Clustering results must be evaluated
on the basis of the variables used to model the research problem. Contrary
to many marketing problems, the variables selected to reflect comparative
world markets did not appear to be theoretically selected. Unfortunately,
they tended to be the usual United Nations type of census data which may
or may not be relevant to the marketer of a specific industry. However,
the paper does demonstrate at least one approach toward the development
of international marketing segments and the cross-cultural analysis of
world segments based on a profile of political, socio-economic, and
demographic measures.

Buyer  Behavior  and  Personal  Characteristics 

Multidimensional relationships between consumer characteristics
and buying behavior are known to be important in identification of
market segments. Lessig and Tollefson (1971) desired to explore and
demonstrate an approach to market segment identification and buyer
behavior by assuming that consumers who exhibit similar buying
behavior and personal characteristics are likely to have similar stimulus
response functions.

Cluster analysis was performed on 20 buying behavior variables
for 212 households. All behavior characteristics were given equal
importance in the clustering procedure. This was a novel departure
from most cluster analysis applications and was achieved by dividing
squared distances $(X_{ik} - X_{jk})^2$ in the distance formula for each single
characteristic by the number of dimensions for that characteristic.
Average within-cluster distance (AWCD) was used as a measure of cluster
similarity.
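The equal-importance device can be illustrated with a small function. This is a sketch under the assumption that each characteristic is represented by a block of one or more dimensions in the profile vector; the function name and the `blocks` argument (one list of column indices per characteristic) are hypothetical.

```python
def equal_importance_sq_distance(x, y, blocks):
    """Squared distance in which each characteristic counts once,
    however many dimensions are used to measure it."""
    total = 0.0
    for dims in blocks:
        block_sum = sum((x[k] - y[k]) ** 2 for k in dims)
        total += block_sum / len(dims)   # divide by the number of dimensions
    return total

if __name__ == "__main__":
    household_a = [0, 0, 0]
    household_b = [1, 2, 2]
    # Characteristic 1 uses column 0; characteristic 2 uses columns 1 and 2.
    print(equal_importance_sq_distance(household_a, household_b, [[0], [1, 2]]))
```

Without the division, a characteristic measured on many dimensions would contribute more terms to the sum and would therefore weigh more heavily in the clustering.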

Household personal characteristics were measured and related to each
buying segment and for all households. To test the linear relationship
between buying behavior and personal characteristics, canonical
correlation analysis was used. A stepwise discriminant analysis was also
performed for the prediction of buyer group membership on the basis of
personal characteristics. An unbiased estimate of its predictive
validity was obtained from a 28 household validation sample, with rather
poor predictive results.

This paper represents an excellent example of the complementary,
multistage use of multivariate methods. However, the poor
classification results in the validation stage of the discriminant analysis
do warrant some cautions that a researcher must be cognizant of if a
fruitful linkage is to be made between different multivariate techniques.
For example, it is not at all clear from the paper whether the poor
predictive validation is due to small sample size or lack of homogeneity
of the within-group dispersion matrix across the buyer segments.

Personality  and  Implicit  Behavior  Patterns 

The applicability of cluster analysis to marketing studies relating
personality and behavior patterns is demonstrated by Greeno, Sommers,
and Kernan (1973). Self theoretical concepts of consumer behavior and
personality trait theory were associated, with the end result that a
number of distinctive housewife types could be identified.
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One  hundred  and  ninety  housewives  between  the  ages  of  30  to  45  year3 
old  were  asked  to  sort  38  product  items  according  to  actual  and  idealized 
behavior.   The  set  of  190  self-ratings  were  then  cluster  analyzed  by  Ward's 
hierarchical  (1963)  clustering  algorithm.   The  similarity  measure  used  was 
the  Euclidean  distance  measure,   Replicable  stability  was  insured  by  con- 
ducting a  separate  analysis  on  two  randomly  split  samples.   Race  and  class 
structure  were  controlled  in  forming  the  samples.   Six  clusters  were  sel- 
ected based  on  the  information  loss  measure  computed  in  the  clustering 
procedure.   Cluster  naming  proceeded  on  the  basis  of  the  cluster  means  and 
the  rank  order  of  the  product  array  in  the  clusters.  Tukey's  test  of  mean 
differences  and  ANOVA  procedures  were  used  to  evaluate  differences  in  the 
clusters.   Socio-economic  and  additional  personality  measures  served  as 
external  measures  to  aid  in  the  interpretation  of  the  results. 
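
The procedure described above can be sketched in a few lines of Python
(standard library only). Ward's criterion merges, at each step, the two
clusters whose union adds least to the within-cluster error sum of
squares; the recorded increase plays the role of an information loss
measure. The function names, the toy ratings, and the split-half helper
are illustrative assumptions, not the program used by Greeno, Sommers,
and Kernan.

```python
import random


def centroid(points):
    """Mean profile of a list of equal-length points."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]


def merge_cost(a, b):
    """Increase in error sum of squares if clusters a and b are merged."""
    ca, cb = centroid(a), centroid(b)
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * d2


def ward(points, n_clusters):
    """Merge clusters until n_clusters remain; return clusters and losses."""
    clusters = [[p] for p in points]
    losses = []  # information loss recorded at each merge
    while len(clusters) > n_clusters:
        cost, i, j = min(
            (merge_cost(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        losses.append(cost)
    return clusters, losses


def split_half(points, seed=0):
    """Randomly split the sample into two halves for a stability check."""
    rng = random.Random(seed)
    shuffled = list(points)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical two-attribute ratings forming two obvious groups.
ratings = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clusters, losses = ward(ratings, n_clusters=2)
print(clusters)
print(losses)
```

A sharp jump in the recorded loss when one further merge is forced is
the usual signal that the previous number of clusters should be kept.
Running `ward` separately on the two halves returned by `split_half`
and comparing the cluster means is one way to check replicability.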

Several methodological comments must be made at this point. First,
implicit in using Euclidean distance is the idea that the variables
(products) were uncorrelated. This could result in the implicit
differential weighting of products depending upon the choice of product
configurations, as discussed by Green, Robinson, and Frank (1967).
Second, the relationship between "self" and "ideal" traits should have
been analyzed first through other techniques such as simultaneous
factor analysis or canonical correlation analysis. When sample size
allows, sample splitting is a recommended procedure. The use of
external measures was interestingly incorporated in the paper.
Supportive validity of the results would have been achieved if actual
usage rates of the consumer products had been measured.
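
The point about Euclidean distance and correlated variables can be
illustrated with a small numerical sketch (Python; the ratings are
invented for illustration). When a variable that duplicates an existing
one is added, that is, one perfectly correlated with it, Euclidean
distance counts the shared dimension twice, so differences on it
outweigh equal differences on an uncorrelated product.

```python
import math


def euclidean(p, q):
    """Ordinary Euclidean distance between two rating profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Hypothetical ratings on two uncorrelated products.
a, b, c = (1.0, 5.0), (4.0, 5.0), (1.0, 8.0)
d_ab = euclidean(a, b)  # a and b differ by 3 on product 1 only
d_ac = euclidean(a, c)  # a and c differ by 3 on product 2 only

# Add a third product whose ratings duplicate those of product 1.
a3, b3, c3 = (1.0, 1.0, 5.0), (4.0, 4.0, 5.0), (1.0, 1.0, 8.0)
d_ab3 = euclidean(a3, b3)  # the product-1 difference is now counted twice
d_ac3 = euclidean(a3, c3)  # unchanged

print(d_ab, d_ac)
print(d_ab3, d_ac3)
```

Before the duplicate is added the two distances are equal; afterwards
the pair that differs on the duplicated dimension is farther apart,
although no new information has been introduced.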

Market  Experimentation 

The results of an experimental approach to testing the sales effect of
three different price level changes in a new food product are reported
by Day and Heeler (1971). Their analysis is concerned with the
construction of a randomized block experiment with five strata of three
stores each, representative of a 58-store test market. A modified
matching coefficient and a modified Euclidean distance measure were
used to construct a 58 x 58 similarity matrix. To reduce the redundancy
of variables, a factor analysis of the 12 store attributes was carried
out; it explained 77 per cent of the total variance.

Each factor was weighted by \lambda_k (the subjective importance
assigned by experts), and the modified distance measure was calculated
by

    d_{ij} = \left( \sum_k \lambda_k (x_{ik} - x_{jk})^2 \right)^{1/2}        (16)

This is equivalent to stretching the dimensionality of each factor by
its subjective importance. The stores were iteratively reassigned to
clusters by a hierarchical clustering method and by optimizing the
average within-cluster similarity, subject to the constraint that five
clusters of three or more stores be formed.
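
As a rough sketch of an importance-weighted distance of this kind
(Python, standard library only), the function below multiplies each
squared factor-score difference by a weight before summing. It assumes
the weights enter linearly inside the square root; that form, the names
`weighted_distance` and `stretch`, and the scores and weights are
illustrative assumptions rather than details taken from Day and Heeler.

```python
import math


def weighted_distance(x_i, x_j, weights):
    """Weighted Euclidean distance between two factor-score profiles."""
    if not (len(x_i) == len(x_j) == len(weights)):
        raise ValueError("profiles and weights must have the same length")
    return math.sqrt(
        sum(w * (a - b) ** 2 for a, b, w in zip(x_i, x_j, weights))
    )


def stretch(x, weights):
    """Rescale each factor axis by the square root of its weight."""
    return [math.sqrt(w) * a for a, w in zip(x, weights)]


# Hypothetical factor scores for two stores and expert weights.
store_1, store_2 = (1.0, 2.0), (3.0, 5.0)
importance = (4.0, 1.0)

direct = weighted_distance(store_1, store_2, importance)
stretched = weighted_distance(
    stretch(store_1, importance), stretch(store_2, importance), (1.0, 1.0)
)
print(direct, stretched)
```

Under this assumption the two printed values agree: weighting the
squared differences is the same as stretching each factor axis by the
square root of its weight and then using ordinary Euclidean distance.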

Representativeness and homogeneity of the strata were evaluated by
reduced space analysis using non-metric scaling and principal components
analysis. Representativeness was measured by

    R = \sum_{i=1}^{n} (\sigma_{is} / \sigma_{it}) / n        (17)

which compares dispersion across the 12 store attributes for each
dimension i. This allows evaluation of the bias and dispersion produced
by the clustering approach.

Several methodological problems arise in a study of this type. First,
the sensitivity and the reliability of the weighting scheme were not
tested. Second, the modified matching coefficient was developed to link
ordinal-interval-ratio data, and when this measure is used on interval
data the properties of the data are not fully exploited. Nevertheless,
the comparison of the two similarity measures and the two clustering
methods provided additional empirical support for their study. The
representativeness achieved by reduced space analysis, whether metric
or nonmetric, for randomized block experiments offers promise for
further research.

The main disadvantage of the study is that the original objective of
evaluating the effect of three price level changes on a new food product
through a randomized block experimental approach was never discussed in
the text of the paper.

Free  Response  Data  Analysis 

Green, Wind, and Jain (1973) suggest using a tandem reduced space and
clustering approach in the analysis of free response marketing data.
Free response marketing data usually consist of unstructured judgments
expressing like-dislike or word association phrases. The purpose of
their paper was to describe current limitations and methodological
extensions of free response data analysis in marketing.

In the first example, the connotations of certain words for a new
shampoo among 84 female respondents between 18 and 30 years of age were
examined to find the similarity between eight stimulus words and the
evoked word associations. An 8 x 19 word association frequency matrix
was obtained in which the column entries were conditional responses to
the row stimuli. A hybrid version of Kruskal's M-D-SCAL V scaling
algorithm was applied to the word association matrix, and five
dimensions were required to obtain an "adequate" fit of the model to
the data. The 19 evoked stimulus words were positioned in the common
reduced space.

Since the results were not easily interpreted and the configuration
was nonunique, Euclidean interpoint distances were calculated in the
five dimensional space to form the 19 x 19 dissimilarity matrix. A
hierarchical tree structure form of cluster analysis was then applied
to the Euclidean dissimilarity matrix to determine the word association
relationship between the stimulus phrases and evoked words. A second
illustration in the study dealt with well-known women's home service
magazines; the 107 respondents were media buyers for 41 different
advertising agencies. Respondent protocols were analyzed to obtain the
frequency of evaluated type words and/or phrases. Further examples are
illustrated and application areas listed.

The approach is interesting as one way of handling free response data,
and the use of cluster analysis provided a powerful aid to the
interpretation of a multidimensional scaling solution. Because the
results are exploratory, however, little can be said about the
stability, reliability, or feasibility of using the results in a
marketing decision context. The graph theoretic clustering approaches
proposed by Hubert (1973) seem to offer another structural approach
that can be non-metric and capable of asymmetric clustering of free
response data.

A  Method  of  "Quick"  Cluster  Analysis 

For the researcher without a technical background in multivariate
analysis, the method of "quick" cluster analysis developed by McQuitty
(1968, 1971) is elaborated and applied in market research by Kamen
(1970). In this paper, quick clustering is viewed as a first
approximation to the reality of a complex world. The emphasis is on
research methodology and a solid understanding of the research area.

The method begins with a matrix of similarity coefficients, usually
correlation coefficients. The clustering strategy is summarized in the
following seven steps:

(1)  The  highest  correlation  in  each  column  of  the  similarity 
matrix  is  identified. 

(2)  The  highest  element  in  the  similarity  matrix  is  selected 
as  the  nucleus  of  the  first  cluster. 

(3)  Any  other  object  having  its  highest  correlation  with 
either  one  of  the  two  entities  in  the  first  cluster 
is  joined  to  that  cluster. 

(4)  Excluding  the  already  clustered  objects  the  next  highest 
correlation  is  selected. 

(5)  Repeat  steps  2  to  3  for  step  4. 

(6)  Repeat  steps  4  to  5. 

(7)  Examine  your  results. 

In the event of a tie, sum the correlations in each column; the column
with the highest sum has priority. One application of the quick
clustering procedure related to consumer opinions of gasoline stations.
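
The seven steps can be sketched in Python as follows. This is one
reading of the summary above, not McQuitty's or Kamen's own program.
Three assumptions are made: the highest correlations are recomputed
among the objects that remain after each cluster is removed, the tie
rule is applied when choosing the nucleus pair, and a single leftover
object forms its own cluster. The function name `quick_cluster`, the
labels, and the correlation matrix are hypothetical.

```python
def quick_cluster(r, labels):
    """Group objects using a symmetric similarity matrix r (list of lists)."""
    remaining = list(range(len(r)))
    clusters = []
    while len(remaining) > 1:
        # Step 1: most similar remaining partner of each remaining object.
        best = {
            j: max((i for i in remaining if i != j), key=lambda i: r[i][j])
            for j in remaining
        }
        # Column sums, used only to break ties (assumed reading of the rule).
        col_sum = {
            j: sum(r[i][j] for i in remaining if i != j) for j in remaining
        }
        # Steps 2 and 4: the highest remaining correlation is the nucleus.
        pairs = [
            (a, b)
            for k, a in enumerate(remaining)
            for b in remaining[k + 1:]
        ]
        a, b = max(
            pairs,
            key=lambda p: (r[p[0]][p[1]], col_sum[p[0]] + col_sum[p[1]]),
        )
        # Step 3: join objects whose highest correlation is with a or b.
        members = [j for j in remaining if j in (a, b) or best[j] in (a, b)]
        clusters.append([labels[j] for j in members])
        # Steps 5 and 6: repeat on the objects not yet clustered.
        remaining = [j for j in remaining if j not in members]
    if remaining:
        clusters.append([labels[remaining[0]]])
    return clusters


# Hypothetical correlations among five opinion items A to E.
labels = ["A", "B", "C", "D", "E"]
r = [
    [1.0, 0.8, 0.6, 0.1, 0.2],
    [0.8, 1.0, 0.5, 0.2, 0.1],
    [0.6, 0.5, 1.0, 0.3, 0.2],
    [0.1, 0.2, 0.3, 1.0, 0.7],
    [0.2, 0.1, 0.2, 0.7, 1.0],
]
print(quick_cluster(r, labels))  # -> [['A', 'B', 'C'], ['D', 'E']]
```

Here A and B form the first nucleus (r = 0.8), C joins because its
highest correlation is with A, and D and E form the second cluster.
Step 7, examining the results, remains a matter of judgment.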

External criteria can often be used to validate and aid in the
interpretation of the cluster analysis results, as Kamen suggests. As a
first approach, quick clustering has several definite advantages over
the more complicated heuristic approaches. Working directly with the
data enables the researcher to understand and to conceptualize his
findings better. Unless one is familiar with the mechanics of the
clustering methods, analytical results may be overinterpreted or even
misinterpreted. In summary, this paper argues for simplicity in
clustering rather than the complexity normally associated with a
multivariate method. It should only be viewed as a first step, but the
approach is worth the time and effort. Aaker (1971) has extended this
approach by plotting points in a principal component space and
selecting clusters by a similar approach. This reduces the
dimensionality of the problem. Computationally better methods of
cluster analysis have been developed from which to make informed
judgments. When weighing the advantages and costs of implementation,
however, the market researcher with limited knowledge of or interest in
mathematical methods might well consider this approach.

Some  Concluding  Observations 

Cluster analysis is an important addition to the family of
multivariate techniques and to marketing methodology. This paper has
attempted to summarize and introduce the concepts that are essential
for proper application of the method to marketing research. While we
reviewed some selected applications in this paper, they are not
all-inclusive of the substantive areas where the technique may be
applied.

Subjective decisions in cluster analysis should be viewed as a
challenge for innovation and not as an impediment to its use in
providing understanding of complex multidimensional marketing problems
when groups are not known a priori. The ability to deal with the major
decisions in cluster analysis and an awareness of the problem areas are
a first step in the orderly classification of multivariate marketing
phenomena.

Table 2. Some Methods of Cluster Analysis

I.   Hierarchical Analysis

     1. Agglomerative                   Johnson (1967), Ward (1963),
                                        Gruvaeus and Wainer (1972)
     2. Divisive                        Edwards and Cavalli-Sforza (1965)
     3. Tree Structures                 Hartigan (1967)
     4. Graph Theory                    Hubert (1974)

II.  Non-Hierarchical Analysis

     1. Minimum Variance Partitioning   Friedman and Rubin (1967),
                                        MacQueen (1967), McRae (1973)
     2. Discriminant Clustering         Urbankh (1972), Mayer (1971)
     3. Centroid Clustering             Ball and Hall (1965)

III. Other Methods

     1. Obverse Factor Analysis         Harman (1967)
     2. Key Cluster Analysis            Tryon and Bailey (1970)
     3. Pattern and Mixture Analysis    Wolfe (1971)
     4. Typal and Linkage Analysis      McQuitty (1967, 1968, 1970, 1971)